(54) Title: MULTIPLE STEP IDENTIFICATION OF RECORDINGS

(57) Abstract: Multiple information is extracted from an unknown recording and information associated therewith. Associated
information includes the filename, if the recording is a computer file in, e.g., MP3 format, or table of contents (TOC) data, if the
recording is on a removable medium, such as a compact disc. At least one and preferably several algorithmically determined
fingerprints are extracted from the recording using one or more fingerprint extraction methods. The information extracted is compared
with corresponding information in a database maintained for reference recordings. Identification starts with the most accurate and
efficient method available, e.g., using a hash ID, a unique ID or text. Fingerprint matching is used to confirm other matches and
validation is performed by comparing the duration of the unknown and a possibly matching reference recording.

TITLE OF THE INVENTION

MULTIPLE STEP IDENTIFICATION OF RECORDINGS 

CROSS-REFERENCE TO RELATED APPLICATION(S) 

[0001] This application is related to and claims priority to U.S. provisional application entitled
DIGITAL MUSIC MULTIPLE STEP IDENTIFICATION METHOD AND SYSTEM having serial
number 60/308,594, by Dale T. Roberts, et al., filed July 31, 2001, and incorporated by
reference herein.

BACKGROUND OF THE INVENTION 

1. Field of the Invention

[0002] The present invention is directed to recognition of recordings from their content, and, 
more particularly, to combining fingerprint recognition with other information about a recording to
increase reliability of recognition and to accomplish reliable recognition efficiently by using the 
least expensive forms of recognition first and layering on more complex forms as needed. 

2. Description of the Related Art

[0003] There are many uses for recognition of audio (and video) recordings. Many of the uses
relate to compensation or control by the rights holders for reproduction and performance of the
works recorded. This use of such systems has increased in importance since the development
of file sharing software, such as Napster, and the many other similar services available at the
end of the twentieth century and the beginning of the twenty-first century. Although the need for
accurate recognition has been significant for several years, no system has been successful in 
meeting this need. 

[0004] Another use of recording recognition is to provide added value to users when listening to
(or watching) recordings. One example is the CDDB Music Recognition Service from
Gracenote, Inc. of Berkeley, California, which recognizes compact discs (CDs) and supplies
information regarding a recognized CD, such as album name, artist, track names and access to
related content on the Internet, including album covers, artist and fan websites, etc. While the
CDDB service is effective for recognizing compact discs, there are several drawbacks in using
it to recognize files that are not stored on a removable disc, such as CD or DVD.

[0005] All audio fingerprinting techniques have "blind spots", places where a system using that 
technique sees similarities and differences in audio where it shouldn't. By relying on just one 
fingerprinting technique, single source solutions are less accurate when encountering a "blind
spot".

[0006] One of the more popular uses for the Gracenote CDDB system is in applications that
digitally encode audio files into MP3 and other formats. These encoding applications utilize
Gracenote's CDDB service to recognize the compact disc being encoded and to write the
correct metadata into the title and ID tags. Gracenote's CDDB service returns a unique ID (TUID)
for each track and supports the insertion of such IDs in the ID3V2 tags for MP3 files. The TUID
is both hashed and proprietary, and can only be read by the Gracenote system. However, the
ID3V2 tags can easily be manipulated to store a TUID for one file in the ID3V2 tag for another
file and therefore, the TUID alone is not a reliable identifier of the audio content in a file.

[0007] Gracenote's CDDB service also provides text matching capability that can be utilized to 
identify digital audio files from their file names, file paths, ID tags (titles), etc. by matching the
text extracted by a client device to a metadata database of track, artist, and album names.
Although this text matching utilizes user-generated spelling variants associated with each record
to improve recognition, there has been no way to verify that the text matches the audio content
of the recording once the recording has been separated from a compact disc and stored in a file
in any format.

SUMMARY OF THE INVENTION 

[0008] An aspect of the present invention is maximizing identification of recordings while
minimizing resource usage. 

[0009] Another aspect of the present invention is using multiple identification methods so that 
resource intensive methods, such as audio fingerprinting, are employed only when necessary. 

[0010] A further aspect of the invention is minimization of processing of unidentified data.

[0011] Yet another aspect of the present invention is to use the least expensive recognition 
technique, with progressively more expensive recognition techniques layered onto the process 
until a desired confidence level is reached. 

[0012] A still further aspect of the invention is validation of content-based identification of a 
recording by comparing text associated with an unidentified recording and text associated with
identification records.

[0013] Yet another aspect of the present invention is use of recording identification methods
from different sources to increase reliability.

[0014] A still further aspect of the invention is validation of content-based recording
identification using fuzzy track length analysis.

[0015] Yet another aspect of the invention is automatic extraction of identification data for use in
a reference database and for identification of recordings.

[0016] A still further aspect of the invention is that unidentified recordings are periodically re-run
through the system to determine if recently added data or recently improved techniques will
result in recognition.

[0017] The above aspects can be attained by a method of identifying recordings by extracting
information about an unknown recording stored in media possessed by a user and at least one
algorithmically determined fingerprint from at least one portion of the unknown recording;
determining a possible identification of the unknown recording using at least one piece of the
information extracted from the unknown recording and an identification database of
corresponding information for reference recordings; and identifying the unknown recording when
the possible identification based on each of the at least one piece of the information in
combination with the at least one algorithmically determined fingerprint identifies a single
reference recording with respective confidence levels. The at least one portion of the unknown
recording may contain audio, video or both.

[0018] Preferably, the database is maintained by a provider of identification services which
supplies unique identifiers that can be recognized only by servers under the control of the
provider of identification services. The unique identifiers are associated with recordings once
they have been identified. Subsequently, copies of the recordings are recognized using the
unique identifiers to greatly speed up the process. The unique identifiers optionally are cached
in high-speed RAM or specially indexed database tables.

[0019] When non-waveform data is not available for an unknown recording, the unknown
recording is preferably identified by extracting fingerprints from at least one portion of the
unknown recording using a plurality of algorithms; determining a possible identification of the
unknown recording using at least two of the fingerprints extracted from the unknown recording
and at least one database of correspondingly generated fingerprints for reference recordings;
and identifying the unknown recording when the possible identification based on each of the
fingerprints identifies a single reference recording with respective confidence levels.

[0020] Preferably, an existing database, used to identify recordings possessed by users, which
does not contain fingerprint information is expanded by obtaining non-waveform data associated
with a recording possessed by a user of the database; extracting at least one fingerprint from at
least one portion of the recording; and storing the at least one fingerprint as identifying
information for the recording, when a match is found in the database for the non-waveform data.
One example is that during the process of encoding digital music files from an audio CD
possessed by a user, a recognition system can be used to identify the audio CD so that
fingerprints extracted during the encoding process can be directly associated with the audio CD
using a unique ID system.

[0021] Recognition of recordings using either fingerprints or unique identifiers is preferably
validated by other information maintained in the identification database, such as the length of
the recording or a numeric identifier embedded within the recording. Information about
recordings that do not pass validation or match some, but not all of the information used for
identification, may be stored for later analysis of the reason for the error. If the fingerprints are
obtained as described above, there may have been an error in obtaining the fingerprint.
Therefore, errors may be output to an operator, or the system could correct the information
stored in the database, based on recognition of patterns in the information that is stored for
improper matches. For example, if a large percentage of matching fingerprints are stored, but
the other information consistently does not match them, there could be an error in the fingerprint
database which needs to be flagged to an operator.

[0022] The present invention includes a system for identifying recordings that includes an
extraction unit to extract information about an unknown recording stored in media possessed by
a user and at least one algorithmically determined fingerprint from at least one portion of the
unknown recording; and an identification unit, coupled to the extraction unit, to make a possible
identification of the unknown recording using at least one piece of the information extracted from
the unknown recording and an identification database of corresponding information for
reference recordings, and to identify the unknown recording when the possible identification
based on each of the at least one piece of the information in combination with the at least one
algorithmically determined fingerprint identifies a single reference recording with respective
confidence levels.

[0023] The present invention also includes a system for identifying recordings that includes an
extraction unit to extract fingerprints from at least one portion of an unknown recording using a
plurality of algorithms, and an identification unit, coupled to said extraction unit, to make a
possible identification of the unknown recording using at least two of the fingerprints extracted from
the unknown recording and at least one database of correspondingly generated fingerprints for
reference recordings, and to identify the unknown recording when the possible identification
based on each of the fingerprints identifies a single reference recording with respective
confidence levels.

[0024] In either of the systems described above, the extraction unit is typically a client unit
connected by a network, such as the Internet, to at least one server as the identification unit.
The client device may be a personal computer with a drive accessing the recording, a consumer
electronics device with a network connection, or a server computer transmitting the unknown
recording from one location to another. Furthermore, a portion of the database may be
available locally and the extraction unit and identification unit may reside in the same device and
share components.

[0025] The present invention also includes a system for obtaining reference information stored
in a database used to identify unknown recordings, including a receiving unit to obtain
non-waveform data associated with a recording possessed by a user of the database for
identification of recordings possessed by the user; an extraction unit to extract at least one
fingerprint from at least one portion of the recording; and a storage unit, coupled to said
receiving unit and said extraction unit, to store the at least one fingerprint as identifying
information for the recording, when a match is found in the database for the non-waveform data.

[0026] These together with other aspects and advantages which will be subsequently apparent,
reside in the details of construction and operation as more fully hereinafter described and
claimed, reference being had to the accompanying drawings forming a part hereof, wherein like
numerals refer to like parts throughout.

BRIEF DESCRIPTION OF THE DRAWINGS 

Figure 1 is a functional block diagram of a system according to the present invention. 
Figure 2 is a flowchart of fingerprint extraction according to the present invention.
Figure 3 is a flowchart of a method of recognizing unknown recordings.
Figures 4A-4C are a block diagram of a system according to the present invention.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS 

[0027] According to the present invention, a suite of identification components is provided in a
system like that illustrated in Fig. 1 to facilitate analysis and identification of audio (and video)
files utilizing multiple methods. Preferably, an existing database 90 containing recording
identifiers and text data is combined with text-based digital audio and audio fingerprinting
identification methods. Preferably, the text data in database 90 is obtained from user
submissions and includes user-submitted spelling variants. One such database is available as
the CDDB Music Recognition Service from Gracenote, Inc.

[0028] As illustrated in Fig. 1, a recording 100 is accessed by client device 110 via any
conventional method, such as reading a digital audio file from a hard drive or a compact disc.
Information is extracted from recording 100 and associated information (metadata). Fingerprints
are extracted from recording 100, as described in more detail below. The information that is
extracted from the metadata includes the duration of the recording which is the track length
(from the TOC) for a CD track, the filename and ID3 tag if the recording is in an MP3 file, and
the table of contents (TOC) data if the recording is on a CD. If the file containing the recording
was produced by a client device operating according to the invention, a unique ID will be
extracted from the ID3 file, but initially it will be assumed that information is not available.

[0029] In an exemplary embodiment, the extracted information is sent from client 110 to server
120 to determine a possible identification of the unknown recording using at least one piece of
the information extracted from recording 100 and a database 130 of correspondingly generated
fingerprints for reference recordings. If text or a unique ID were extracted, an attempt is made
to find a match. If a match is found using the text or unique ID, at least one algorithmically
determined fingerprint is compared with the fingerprint(s) stored in the matching records to
determine whether there is a single reference recording that matches the information extracted
from recording 100 with respective confidence levels for each item of information that matches.
If no matches can be found based on text and unique ID, an attempt is made to identify a
single reference recording using at least two of the fingerprints extracted from recording 100. If
a single reference recording is located using either method, preferably the duration of recording
100 is compared with the duration of the single reference recording as a final validation step.
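
The flow in this paragraph can be sketched as follows. This is only an illustration under stated assumptions: the names Reference, Unknown, fingerprint_hits and identify are hypothetical, fingerprints are compared by exact equality as a stand-in for a real fingerprint comparator, and the two-second duration tolerance is an arbitrary placeholder rather than a value taken from this description.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Reference:
    ref_id: str
    unique_id: str
    title: str
    duration_s: float
    fingerprints: dict  # algorithm name -> fingerprint value (hypothetical layout)


@dataclass
class Unknown:
    duration_s: float
    fingerprints: dict
    unique_id: Optional[str] = None
    text: Optional[str] = None


def fingerprint_hits(unknown, ref):
    """Count algorithms whose fingerprints agree (exact equality is a stand-in)."""
    return sum(1 for alg, fp in unknown.fingerprints.items()
               if ref.fingerprints.get(alg) == fp)


def identify(unknown, references, duration_tolerance_s=2.0):
    # Cheap step: look for candidates by unique ID or by text.
    candidates = [
        r for r in references
        if (unknown.unique_id and r.unique_id == unknown.unique_id)
        or (unknown.text and r.title.lower() in unknown.text.lower())
    ]
    if candidates:
        # A text or ID match must be confirmed by at least one fingerprint.
        confirmed = [r for r in candidates if fingerprint_hits(unknown, r) >= 1]
    else:
        # No text or ID match: require at least two agreeing fingerprints.
        confirmed = [r for r in references if fingerprint_hits(unknown, r) >= 2]
    if len(confirmed) != 1:
        return None  # zero or several references: no single identification
    ref = confirmed[0]
    # Final validation step: compare durations.
    if abs(ref.duration_s - unknown.duration_s) > duration_tolerance_s:
        return None
    return ref.ref_id
```

Note that a unique ID copied into the wrong file is rejected in this sketch, because the fingerprint of the file does not confirm the reference that the ID points to.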

[0030] Preferably, related metadata is used for validation of the match obtained by fingerprint
recognition. Like any recognition system, fingerprinting can produce erroneous results. Without
a validation component, such an error can propagate throughout the system and return
erroneous data to large percentages of users. The use of validation criteria such as track length
comparison enables the system to catch potential errors and flag them for validation.

[0031] A system according to the present invention preferably includes custom result reporting
and flexible administrative interfaces 130 to enable weighting of various identification methods
and the order of their engagement. Analysis of successful match rates for specific identification
methods allows an administrator to manipulate the identifying criteria for each component to
maximize the identification probability. A system according to the present invention preferably
incorporates usage data from over 28 million users utilizing the CDDB database via Gracenote
Data Services division, to help guide results 140.

[0032] The flexibility of a system according to the present invention allows different
configurations to be used for identifying recordings in different environments. An application
that monitors streaming audio, for example, requires a very different system and solution
architecture than one that identifies files in a peer-to-peer system, or one that identifies analog
input. However, the present invention can be configured for identification of recordings in each
of these situations. 

[0033] A system according to the present invention maximizes identification while minimizing 
resource usage. The use of multiple identification methods ensures that more resource 
intensive methods, such as audio fingerprinting, are employed only when necessary. The use of
multiple audio fingerprinting technologies reduces data collision and covers any "blind spots" in
a given audio fingerprint technology. The "blind spots" found in single source fingerprinting
systems are avoided by using multiple sources for different fingerprinting techniques. This also
provides the ability to fine tune deployment for specific target applications. 

[0034] Preferably, fingerprints are obtained using multiple fingerprint recognition services using 
the method illustrated in Fig. 2. This increases the ability of the system to accurately recognize 
recordings of various types. 

[0035] As illustrated in Fig. 2, when unidentified (unknown) recording 100 is accessed by
fingerprint extraction client 110, if possible, conventional TOC/file recognition is performed by
recognition system 210 and results 220 are returned to fingerprint client 110. Results 220
include a unique identifier (TUID) that points into a master metadata database (not shown in
Fig. 2), if the TUID is found. Recording 100 is also processed by fingerprint extractor 230 using
at least one and preferably several different algorithmically derived fingerprint extraction
systems to obtain fingerprint(s) which are stored in fingerprint/ID send cache 240. As described
below in more detail, instructions are received regarding when fingerprint uploader 250 should 
send the fingerprints to fingerprint recognition server 120. 

[0036] In fingerprint recognition server 120, the fingerprints transmitted by fingerprint uploader
250 are initially stored in fingerprint receive cache 260. The fingerprints then undergo
fingerprint validation 270 using an algorithmic comparator that attempts to cross-correlate
fingerprints for a recording with fingerprints uploaded and extracted by different end users. If it
is found that the fingerprints are substantially similar, they would be validated. This is not the
only method that's available for validation, but serves as one example of a process that could be
used to reject bad data.

[0037] In this embodiment, fingerprints that are determined to be valid and related undergo
stitching 280. For example, if fingerprints are taken from 30 second segments of the recording,
the fingerprints are assembled into a continuous fingerprint stream. This could simplify
recognition of segments of the recording. The resulting fingerprints are stored in fingerprint database
290 associated with existing database 90 (Fig. 1).

[0038] The CDDB database has in part been generated through user submissions to create a 
metadata database with over 12 million tracks and 900,000 albums as of mid-2002. This
database contains both basic metadata (artist, album, and track names) as well as extended 
data (genre, label, etc.). 

[0039] A similar distributed collection method may be utilized in the creation of a waveform
database using the system illustrated in Fig. 1. In the case where recording 100 is a raw audio
waveform, e.g., when a CD is encoded into another format, such as an MP3 file, client device
110 obtains non-waveform data associated with recording 100 which is possessed by a user of
database 90 and executes extraction algorithm(s) to extract fingerprints from at least one
portion of the recording. The fingerprints are then sent to server 120 with a unique ID,
preferably derived from the TOC of the CD. When the unique ID is available, i.e., when a
match is found in the database for the non-waveform data, server 120 is able to associate the
appropriate metadata in database 90 and the fingerprint(s) with the same level of accuracy as
identification of CDs by the existing database 90 which is provided for identification of
recordings possessed by users. Fingerprints dynamically gathered in this manner may be sent
to a fingerprint collection server (not shown in Fig. 1) which would accumulate fingerprints from
authenticated clients, as described in more detail below, prior to storing the at least one
fingerprint as identifying information for the recording.

[0040] Multiple fingerprint gathering extractors can also be run over a set of static waveforms
from a commercial encoder such as Loudeye or Muse. The challenge with this approach is
associating the fingerprints with the appropriate metadata. The method described above
enables audio fingerprints to be logically associated with parent records and associated back to
the original audio source. In the preferred embodiment, the unique ID provides differentiation
between live and studio versions of the same song while simultaneously linking those records to
the same artist and their respective albums.

[0041] Preferably, server(s) 120 store information in parallel record sets that are linked with
unique IDs. When client 110 asks server 120 to recognize media (CD, digital audio file, video
file), server 120 may also return a record about how fingerprints should be gathered for this
particular CD. This is called the Gathering Instructions Record (GIR). The GIR may include a
set of instructions that the remote fingerprint gathering code follows. The record may be
precomputed in off hours or may be dynamically computed at the time of recognition.

[0042] Server 120 may use information it knows about the popularity of a CD to drive decisions 
about gathering. Everything about a rare CD could be gathered, because the opportunity to get
the fingerprints should not be missed (even if gathering is somewhat burdensome to the user).
The opposite situation could be true for a very popular CD. The load may be distributed across
many users so that they would not even notice that any work for fingerprint gathering was
occurring. 

[0043] The rules and procedures for building the GIR may be manual or automated and may
change over time. They may also be applied uniquely to specific users, applications or
geographic locations.

[0044] In one embodiment, the server dynamically gathers fingerprints by modifying the GIR to 
remove fingerprints that have been gathered previously. The frequency of updating GIRs may
vary from instant to delays of days, weeks or months. Some example instructions that may be 
included in the GIR are: 

• A list of tracks and segments to be gathered and their priority.

• A fingerprint generator algorithm to use.

• Parameters that tell the fingerprint generator how to process the fingerprint, such as:
    Frequency of audio samples
    Bands of the frequency domain to process
    Resolution of the fingerprint
    Desired quality of audio

• When to do the fingerprint gathering, such as:
    Before encoding the track
    After encoding the track
    In parallel with encoding the track

• Instructions for caching the fingerprint and when to transmit it back to the server, such as:
    Before encoding the track
    After encoding the track
    After the CD has been fully encoded
    When the communication channel back to the server is not busy
    When the next CD is looked up
    When a group of fingerprints is ready for transmission

• Instructions to take CPU power into account in the process so as not to overload the computer.
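
One way to picture a GIR carrying the instructions listed above is as a small immutable record. The sketch below is an assumption made for illustration only: the field names, the When enumeration, and the helpers prune_gathered and next_segment are hypothetical and are not defined by this description. The pruning helper mirrors the embodiment in which the server removes previously gathered fingerprints from the GIR.

```python
from dataclasses import dataclass, replace
from enum import Enum


class When(Enum):
    BEFORE_ENCODING = "before_encoding"
    AFTER_ENCODING = "after_encoding"
    PARALLEL = "in_parallel_with_encoding"


@dataclass(frozen=True)
class Segment:
    track: int
    start_s: float
    length_s: float
    priority: int  # lower value = gather first (an assumed convention)


@dataclass(frozen=True)
class GatheringInstructions:
    segments: tuple          # tracks and segments to gather, with priority
    algorithm: str           # which fingerprint generator to use
    sample_rate_hz: int      # frequency of audio samples
    bands_hz: tuple          # bands of the frequency domain to process
    resolution: int          # resolution of the fingerprint
    gather_when: When        # when to gather relative to encoding
    transmit_when: str       # e.g. "channel_idle", "after_cd_encoded"
    max_cpu_fraction: float  # cap so the client computer is not overloaded


def prune_gathered(gir, already_gathered):
    """Return a copy of the GIR without segments the server already holds."""
    remaining = tuple(s for s in gir.segments
                      if (s.track, s.start_s) not in already_gathered)
    return replace(gir, segments=remaining)


def next_segment(gir):
    """Pick the highest-priority segment still to be gathered, if any."""
    return min(gir.segments, key=lambda s: s.priority, default=None)
```

Keeping the record immutable means each update produces a new GIR, which fits a server that recomputes instructions either in off hours or at recognition time.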

[0045] Preferably, the system attempts to improve the quality of the fingerprints during
operation. Quality of the source signal, the parameters used for fingerprinting, along with
improvements in the fingerprinting algorithms will result in a complex quality matrix that is used
by server 120 to determine what fingerprints to gather if higher quality is available. An example
of source quality is provided below. Preferably, database 90 or a similar database maintained
by fingerprint collection server(s) stores the source quality for fingerprints stored in the
database, so that when a fingerprint from a higher quality source is available, the fingerprint may
be replaced.



Source Quality Table

Name                 Bit Rate    Compression   Error Correction   Quality Index
CD Audio HEC         44100kbps   None          Hardware           1
CD Audio SEC         44100kbps   None          Software           2
CD Audio             44100kbps   None          None               3
CDR Audio            44100kbps   None          None               4
CDR Made From MP3    44100kbps   mp3           None               5
MP3 File             160kbps     mp3           None               6



[0046] Fingerprints dynamically gathered may contain information that helps validate quality.
Information such as errors while reading from the media may be sent up to the fingerprint
collector. The system may reject fingerprints that had high error rates from the source media.
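
A minimal sketch of how the quality index and the read error rate might gate what is stored follows. The function name accept_fingerprint, the in-memory dictionary used as a store, and the 1% error-rate limit are assumptions made for illustration; only the source names and index values are taken from the Source Quality Table, where a lower index means a better source.

```python
# Quality index per source, as listed in the Source Quality Table (1 = best).
QUALITY_INDEX = {
    "CD Audio HEC": 1,
    "CD Audio SEC": 2,
    "CD Audio": 3,
    "CDR Audio": 4,
    "CDR Made From MP3": 5,
    "MP3 File": 6,
}


def accept_fingerprint(store, recording_id, fingerprint, source,
                       read_error_rate, max_error_rate=0.01):
    """Store the fingerprint only if it is clean and from a better source.

    store maps recording_id -> (fingerprint, quality_index); the 0.01 limit
    is a placeholder, not a value given in this description.
    """
    if read_error_rate > max_error_rate:
        return False  # too many read errors on the source media
    new_quality = QUALITY_INDEX[source]
    current = store.get(recording_id)
    if current is not None and current[1] <= new_quality:
        return False  # the stored fingerprint is already as good or better
    store[recording_id] = (fingerprint, new_quality)
    return True
```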

[0047] As noted above, instead of immediately storing a fingerprint, multiple fingerprints for a
recording may be gathered by a fingerprint collection server prior to being added to the
database. These fingerprints may be compared algorithmically to determine their correlation. If
correlation is not adequate then additional fingerprints may be gathered until adequate
correlation is achieved and one of the fingerprints or a composite fingerprint is stored in the
database. This prevents bad fingerprints from becoming part of the database.
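
The gather-until-correlated idea can be sketched as below, assuming each submitted fingerprint is a plain list of numbers. The Pearson correlation, the 0.95 threshold, the requirement of two agreeing submissions, and the element-wise mean used as the composite are all illustrative choices, and the function names are hypothetical.

```python
from math import sqrt


def pearson(a, b):
    """Pearson correlation of two numeric sequences (truncated to equal length)."""
    n = min(len(a), len(b))
    a, b = a[:n], b[:n]
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / sqrt(var_a * var_b)


def composite_if_consistent(submissions, min_corr=0.95, min_agreeing=2):
    """Return a composite fingerprint, or None if more submissions are needed."""
    best = []
    for i, fp in enumerate(submissions):
        # Group this submission with every other one it correlates with.
        group = [other for j, other in enumerate(submissions)
                 if j == i or pearson(fp, other) >= min_corr]
        if len(group) > len(best):
            best = group
    if len(best) < min_agreeing:
        return None  # correlation not adequate yet: keep gathering
    n = min(len(fp) for fp in best)
    # Composite: element-wise mean of the agreeing submissions.
    return [sum(fp[k] for fp in best) / len(best) for k in range(n)]
```

A caller would keep appending newly uploaded fingerprints to the submissions list and store the result the first time the function returns something other than None.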

[0048] Stitching of the segmented fingerprints may be necessary since slight variations in timing
could result in overlap of the fingerprints. Algorithmic stitching could result in a higher quality
continuous fingerprint. Simple stitching appends segmented fingerprints in order of appearance
in the recording. Complex stitching could involve scaling different qualities of fingerprints to the
lowest common denominator and then appending them in order of their appearance in the
recording. Preferably, some form of mathematical fitting is utilized if the fingerprint segmentation
contains jitter, so that appending is a fuzzy process rather than simple addition of the datastream.
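
As a rough illustration of stitching, the sketch below assumes each segment arrives as a pair of a start frame index and a list of numeric frame values. Averaging overlapping frames is used here as a crude stand-in for the mathematical fitting mentioned above, and gaps are marked with None; both choices, and the function name stitch, are assumptions.

```python
def stitch(segments):
    """Merge (start_frame, frames) segments into one continuous list.

    Segments are placed in order of appearance. Where two segments overlap
    because of timing jitter, the overlapping frames are averaged. Frames
    that no segment covers are left as None.
    """
    ordered = sorted(segments, key=lambda seg: seg[0])
    if not ordered:
        return []
    base = ordered[0][0]
    out = []
    for start, frames in ordered:
        offset = start - base
        for i, value in enumerate(frames):
            pos = offset + i
            if pos < len(out):
                # Overlap (or a previously empty slot): blend or fill in.
                out[pos] = value if out[pos] is None else (out[pos] + value) / 2
            else:
                out.extend([None] * (pos - len(out)))  # pad any gap
                out.append(value)
    return out
```

With perfectly aligned segments this reduces to the simple stitching case of appending segments in order.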

[0049] One example of audio fingerprinting that can be used is described in the U.S. patent
application entitled Automatic Identification of Sound Recordings, filed by Maxwell Wells et al.
on July 22, 2002 and incorporated herein by reference. However, any known algorithmically
derived fingerprinting technique may be used, not only for digital audio, but also video, TV
programs (both analog and digital) and DVDs. Appropriate identifiers and recognition
techniques will be used for the media to be recognized in a particular application.

[0050] The present invention provides great flexibility and can be utilized for a wide variety of
environments, including MP3 recognition in a peer-to-peer environment, or identification of an
audio stream for monitoring and reporting purposes. No other solution is known to use multiple
recognition components; so it is the only solution that can be customized to meet the needs of
any audio (or video) recognition application.

[0051] A functional description for a deployment of the present invention in a peer-to-peer
application will be described below with reference to Fig. 3. In this embodiment, audio files are
identified before providing public access to them, to determine if the files are allowed in the
system, a process known as "filter-in".

[0052] Client device 110 (Fig. 1) extracts information 310 (Fig. 3) from an audio file at the time
of upload to server 120 (Fig. 1). The extracted information preferably includes non-waveform
data, such as a unique ID, ID3 tag, filename text data, track duration, etc. and fingerprint(s)
extracted from the recording and sent to server 120 for recognition.

[0053] The initial match 320 is performed against the unique ID, if present. Use of Gracenote's
TUID enables a match to be returned with 99.9% accuracy. This is also the least resource
intensive recognition method and can achieve very fast recognition rates. If the unique ID is
present, the system moves to the validation stage. If no unique ID is present, the system
attempts identification using the next recognition methods 330.

[0054] In this embodiment, text-based identification is tried next, using a metadata database,
such as the Gracenote CDDB service which contains over 900,000 albums and over 12 million
songs. Text matching utilizes available text, such as the filename, file path or text within the ID3
tag for MP3 files, to provide a set of data from which to attempt recognition. If an acceptable
match is returned, the system moves to the validation stage. If a successful match is not
returned, the system attempts identification utilizing the next recognition method.

[0055] The next step is fingerprint identification, in this case using audio fingerprints. The 
fingerprints from an unknown recording are compared to the fingerprints in database 90 for 
reference recordings, one fingerprint at a time (or in parallel using different processors for 
different fingerprints). Each fingerprinting technology returns a match and a level of confidence.
If a single reference recording has acceptable confidence levels, the system moves to the
validation stage. If an unsuccessful match is returned, the system can, depending on the target
application, ask the user for validation of the most likely result or it can return a "no match
found" result. 
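
The staged sequence of paragraphs [0053] to [0055] can be pictured as a cascade that tries the least resource intensive method first and stops at the first acceptable match. The following Python sketch is illustrative only: the names Match, identify and STAGES, the toy lookup tables, and every confidence and threshold value are assumptions made for this example and are not taken from the specification.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence, Tuple


@dataclass
class Match:
    reference_id: str   # identifier of the candidate reference recording
    confidence: float   # 0.0 (no confidence) to 1.0 (certain)
    method: str         # which recognition method produced the match


# A recognizer maps extracted information to a candidate match, or None.
Recognizer = Callable[[dict], Optional[Match]]

# Toy reference data (hypothetical values, for illustration only).
TUID_INDEX = {"TUID-0001": "ref-1"}
TEXT_INDEX = {"some artist - some song": "ref-2"}
FINGERPRINT_INDEX = {"fp-abc": "ref-3"}


def by_unique_id(info: dict) -> Optional[Match]:
    ref = TUID_INDEX.get(info.get("tuid"))
    return Match(ref, 0.999, "unique_id") if ref else None


def by_text(info: dict) -> Optional[Match]:
    # Normalize the filename: drop the extension, trim, lowercase.
    name = info.get("filename", "").rsplit(".", 1)[0].strip().lower()
    ref = TEXT_INDEX.get(name)
    return Match(ref, 0.95, "text") if ref else None


def by_fingerprint(info: dict) -> Optional[Match]:
    ref = FINGERPRINT_INDEX.get(info.get("fingerprint"))
    return Match(ref, 0.90, "fingerprint") if ref else None


# Stages ordered from least to most resource intensive, each paired with
# the minimum confidence (assumed) required to stop the cascade.
STAGES: Sequence[Tuple[Recognizer, float]] = (
    (by_unique_id, 0.99),
    (by_text, 0.90),
    (by_fingerprint, 0.85),
)


def identify(info: dict, stages=STAGES) -> Optional[Match]:
    """Return the first acceptable match, or None for "no match found"."""
    for recognize, threshold in stages:
        match = recognize(info)
        if match is not None and match.confidence >= threshold:
            return match  # the caller would proceed to validation
    return None
```

In this sketch a file carrying a known unique ID never reaches the text or fingerprint stages, which mirrors the intent of using the cheapest method first.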

[0056] Validation is a key component to any successful recognition system. Preferably, key file
attributes, such as the duration of the recording, are used to validate that a file is what the
recognition system says it is by comparing an extracted length of the unknown recording with a
stored length of the single reference recording.
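
A minimal sketch of the duration check in paragraph [0056] follows. The specification does not state how close the two lengths must be, so the tolerance parameter and its default of two seconds are assumptions chosen only for illustration.

```python
def validate_duration(extracted_seconds: float,
                      stored_seconds: float,
                      tolerance_seconds: float = 2.0) -> bool:
    """Return True when the unknown recording's length is close enough
    to the stored length of the candidate reference recording.

    tolerance_seconds is a hypothetical parameter; a real system would
    tune it to allow for encoder padding, trimmed silence and the like.
    """
    return abs(extracted_seconds - stored_seconds) <= tolerance_seconds
```

For example, validate_duration(201.3, 200.0) accepts the candidate, while validate_duration(185.0, 200.0) rejects it.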

[0057] Preferably heuristic and voting algorithms 340 are used to determine if a match is what
the system says it is. This self-monitoring reduces the possibility that the system returns
inaccurate data that pollutes the system. The heuristics may be manually controlled or
algorithmically controlled to produce the best match. These heuristics may also be used to
determine which recognition techniques to apply and in what sequence.

[0058] The administrator of each application can determine the level of accuracy needed by 
each stage (or component) of the system, and therefore has explicit control in optimizing the 
system. For example, if a 90% aggregate match is required the system administrator can use 
administrative interfaces 130 to adjust the levels of acceptable return to 90% and a successful 
result will not be generated unless that threshold is met. The administrator can also set result 
levels for each component. For example, a 99% text match can be required but only an 85% 
audio fingerprint match.

[0059] Once a successful identification is returned, the file will be retagged 350 with the unique
ID, allowing for population of the file with the correct ID throughout the system. As a result,
future identification of the file will require the least resource intensive recognition method.
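
The administrator-controlled thresholds of paragraph [0058] can be sketched as a per-component minimum combined with an aggregate minimum. The specification does not define how the aggregate is computed; the unweighted mean used here, and the names accept, component_min and aggregate_min, are assumptions for illustration.

```python
from typing import Dict


def accept(scores: Dict[str, float],
           component_min: Dict[str, float],
           aggregate_min: float) -> bool:
    """Decide whether a result meets the configured accuracy levels.

    scores        -- confidence per component, e.g. {"text": 0.995}
    component_min -- minimum confidence required per component
    aggregate_min -- minimum required for the aggregate, taken here to be
                     the unweighted mean of the scores (an assumption)
    """
    if not scores:
        return False
    for name, minimum in component_min.items():
        if name in scores and scores[name] < minimum:
            return False  # one component falls below its own threshold
    aggregate = sum(scores.values()) / len(scores)
    return aggregate >= aggregate_min


# Thresholds from the example in the text: 99% text, 85% audio fingerprint.
LIMITS = {"text": 0.99, "audio_fingerprint": 0.85}
```

With these limits and a 90% aggregate requirement, scores of 0.995 for text and 0.87 for the audio fingerprint are accepted, while a text score of 0.97 is rejected regardless of the fingerprint score.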

[0060] The unique ID (TUID) assigned to the file is then matched 360 against a list 370 of
TUIDs populated through the submission of Title/Artist pairs 370 by labels, publishers, and 
content owners of those files allowed in the system. In one embodiment, if the TUID is present 
in the database, the file is allowed to be shared, but if the TUID is not present in the database, 
the file is blocked. In another embodiment, if the TUID is present in the database, the file is 
blocked. Either of these embodiments could be applied to files recognized as they are 
accessed by a user, or transmitted from one computer to another. 
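
The two embodiments of paragraph [0060] differ only in how presence on the list is interpreted. A minimal sketch, in which ListMode and may_share are hypothetical names introduced for this example:

```python
from enum import Enum
from typing import Set


class ListMode(Enum):
    ALLOW = "allow"  # listed TUIDs may be shared; all others are blocked
    BLOCK = "block"  # listed TUIDs are blocked; all others may be shared


def may_share(tuid: str, listed_tuids: Set[str], mode: ListMode) -> bool:
    """Return True when a file carrying this TUID may be shared."""
    present = tuid in listed_tuids
    return present if mode is ListMode.ALLOW else not present
```

The same check could run when a file is accessed by a user or when it is transmitted from one computer to another.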

[0061] As illustrated in Fig. 4A, an embodiment of the present invention uses a plurality of
related databases. Master metadata database 410 contains information on title, artist/author
name, owner name and date. Related databases include audio fingerprint database 430 and
video fingerprint database 440 which form fingerprint database 290 (Fig. 2). Also included are
track length/TOC database 450, text database 460, hash ID database 470 and guaranteed
unique ID database 480.

[0062] As illustrated in Fig. 4B, when unidentified (unknown) recording 100 is accessed by
client device 110, information is extracted, including fingerprints 540, 550, metadata 560 and
unique ID 570, if present. In addition, the duration 580 of the recording is determined and a
numerical hash 590 is calculated. The extracted fingerprints are compared with fingerprints
600, 610. Similarly, matching 620, 630, 640 is performed on the numerical hash, text and
unique ID. If a reference recording is located, validation is performed by comparing the duration
of unidentified recording 100 with the duration of the reference recording. Results 660-710 with
a level of confidence for each method of comparison are supplied to result aggregator 730.
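
One way to picture result aggregator 730 is as a function that combines the per-method results into a single candidate with an aggregate confidence. The specification does not give the combining rule, so the rule below (sum the confidences reported for each reference recording and divide by the number of methods that reported anything) is purely an assumption, as is the function name aggregate.

```python
from collections import defaultdict
from typing import Dict, Optional, Tuple


def aggregate(results: Dict[str, Tuple[str, float]]) -> Optional[Tuple[str, float]]:
    """Combine per-method results into (reference_id, aggregate confidence).

    results maps a method name (e.g. "audio_fingerprint", "text", "hash")
    to the reference recording it matched and its confidence level.
    Returns None when no method reported a result.
    """
    if not results:
        return None
    totals: Dict[str, float] = defaultdict(float)
    for _method, (reference_id, confidence) in results.items():
        totals[reference_id] += confidence
    best = max(totals, key=totals.get)
    # Methods that voted for a different reference pull the aggregate down.
    return best, totals[best] / len(results)
```

Under this rule, two methods agreeing on one reference at 0.9 and 0.8 while a third points elsewhere yield that reference with an aggregate of about 0.57, which a system might treat as a low-confidence result.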

[0063] If no reference recording is found 750 matching unidentified recording 100, the extracted
information 540-590 and results are stored in unrecognized holding bin 760 for periodic
resubmission to recognition server 120 (Figs. 2 & 4B). In this embodiment, if a reference
recording is located 770 with a low aggregate confidence level, post recognition processing 780
is performed by applying heuristics 790, or a manual review 810, e.g., by presenting one or
more possible matches to the user and receiving the user's selection in response. The results
of such user selections may be included in the heuristics stored in heuristics database 820. If
post recognition processing 780 results in identification of a single reference recording or result
aggregator 730 outputs recognized results 770 with a high aggregate confidence level, the hash
ID is generated 810 and sent to hash database 480 and client device 110, so that the hash and
unique ID (TUID) can be stored in the ID3 tag, if a file is being created.
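
The three outcomes described in paragraph [0063] amount to a routing decision on the aggregated result. In the sketch below, the function name route, the outcome labels and the 0.9 cut-off for a "high" aggregate confidence are all assumptions for illustration.

```python
from typing import Optional, Tuple


def route(candidate: Optional[Tuple[str, float]],
          high_confidence: float = 0.9) -> str:
    """Choose the next step for an aggregated recognition result.

    candidate is (reference_id, aggregate confidence), or None when no
    reference recording was found.
    """
    if candidate is None:
        # Store extracted information for periodic resubmission.
        return "holding_bin"
    _reference_id, confidence = candidate
    if confidence >= high_confidence:
        # Generate the hash ID and return it with the unique ID.
        return "recognized"
    # Apply heuristics or ask the user to choose among possible matches.
    return "post_processing"
```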

[0064] In one embodiment, the system learns by watching errors in repeated attempts at
recognition of similar files to improve its results. It also may receive manual stimulus from users
who indicate that there are errors in the results. This allows recognition to be continuously
validated over time. For example, a file could be recognized by a system according to the
invention, then over time the system determines that recognition of that file was flawed, and 
indicates to an operator that there was something wrong. In another embodiment, the system 
determines what is wrong by monitoring non-fingerprint based data and changing the 
recognition results accordingly. 

[0065] The present invention can be utilized to identify any audio content for tracking purposes.
Digital audio streams, analog inputs or local audio files can all be tracked. Such a tracking
system could be a server side tracking system deployed at the point of audio delivery and
integrated with a reporting, digital rights management (DRM) system, or rights payment system.
If the audio content being tracked was from a non-participating third party, a client version of the
system may be deployed to monitor the content being distributed. In either case, multiple
identification methods would be utilized to ensure the highest rate of accuracy.

[0066] Utilizing waveform recognition as a digital rights management component is possible,
and can be deployed to compare user created digital audio files with lists of approved content. 
This enables a filter-in approach within a peer-to-peer file sharing architecture such as the one 
described above. 

[0067] Audio fingerprinting technologies can be used as an anti-piracy tool, and can be
customized to the type of audio being investigated. In the case of pirated CDs, Gracenote's
CDDB CD service may be utilized to provide table of contents (TOC) recognition to augment
audio fingerprinting technologies.

[0068] Identification is the enabling component to deliver value-added services. Without explicit
knowledge of the content being distributed it is impossible to distribute value-added content and
services that relate to that audio content.

[0069] The many features and advantages of the invention are apparent from the detailed
specification and, thus, it is intended by the appended claims to cover all such features and
advantages of the invention that fall within the true spirit and scope of the invention. Further,
since numerous modifications and changes will readily occur to those skilled in the art, it is not
desired to limit the invention to the exact construction and operation illustrated and described,
and accordingly all suitable modifications and equivalents may be resorted to, falling within the
scope of the invention. For example, the system and method have been described as using a
unique identifier. However, a hashed identifier could be used instead.

CLAIMS

What is claimed is:

1. A method of identifying recordings, comprising:

extracting information about an unknown recording stored in media possessed by
a user and at least one algorithmically determined fingerprint from at least one portion of the
unknown recording;

determining a possible identification of the unknown recording using at least one
piece of the information extracted from the unknown recording and an identification database of
corresponding information for reference recordings; and

identifying the unknown recording when the possible identification based on each
of the at least one piece of the information in combination with the at least one algorithmically
determined fingerprint identifies a single reference recording with respective confidence levels.

2. A method as recited in claim 1,

wherein the identification database is maintained by a provider of identification
services, and

wherein said determining uses a unique identifier from the provider of
identification services when the unique identifier is associated with the unknown recording and
otherwise uses text associated with the unknown recording when text is associated with the
unknown recording.

3. A method as recited in claim 2, further comprising validating said identifying by
comparing an extracted length of the unknown recording with a stored length of the single
reference recording.

4. A method as recited in claim 3, wherein the text associated with the unknown
recording includes a filename of the recording.

5. A method as recited in claim 3, wherein the text associated with the unknown
recording includes an ID3 tag for the recording.

6. A method as recited in claim 1, wherein the at least one algorithmically determined
fingerprint is extracted from at least one of audio and video information in the at least one
portion of the unknown recording.

7. A method as recited in claim 1,

wherein the at least one algorithmically determined fingerprint includes at least
two fingerprints, and

wherein said identifying requires each of the at least two fingerprints to identify
the single reference recording with respective confidence levels.

8. A method as recited in claim 1, further comprising validating said identifying by
comparing an extracted length of the unknown recording with a stored length of the single
reference recording.

9. A method as recited in claim 8, further comprising:

repeating said extracting, determining and identifying for a plurality unknown
recordings from a plurality users;

monitoring unsuccessful identifications of the unknown recordings; and

detecting a possible error in the identification database from a pattern of errors.

10. A method as recited in claim 9, wherein said monitoring includes receiving
information from the users indicating that said identifying was incorrect.

11. A method as recited in claim 9, 

wherein said monitoring includes storing the at least one algorithmically 
determined fingerprint and an identifier of the single reference recording when said validating is 
not successful, and 

wherein said method further comprises indicating the possible error in the 
identification database when substantially different fingerprints are stored for a single identifier.

12. A method as recited in claim 9, wherein said method further comprises indicating
the possible error when the at least one algorithmically determined fingerprint matches one of
the reference recordings, but the unknown recording is associated with ID3 tag information
different from that of the one of the reference recordings.

13. A method as recited in claim 9, further comprising correcting the possible error
based on the information extracted from the unknown recordings.

14. A method as recited in claim 8, further comprising indicating a possible error in the
identification database when the at least one algorithmically determined fingerprint is
substantially similar to one of the reference recordings, but substantially different information is
extracted from the unknown recording.

15. A method as recited in claim 1, further comprising delivering related data for the
unknown recording from a supplemental database to supplement data embedded within the
unknown recording for display and user manipulation.

16. A method of identifying recordings, comprising: 

extracting fingerprints from at least one portion of an unknown recording using a 
plurality of algorithms; 

determining a possible identification of the unknown recording using at least two
of the fingerprints extracted from the unknown recording and at least one database of
correspondingly generated fingerprints for reference recordings; and

identifying the unknown recording when the possible identification based on each 
of the fingerprints identifies a single reference recording with respective confidence levels. 

17. A method as recited in claim 16, wherein each fingerprint is extracted from at least
one of audio and video information in the at least one portion of the unknown recording.

18. A method as recited in claim 16, further comprising validating said identifying by
comparing a length of the unknown recording with a stored length of the single reference
recording.

19. A method as recited in claim 18,

wherein said extracting is performed by client equipment possessed by a user,

wherein said determining, identifying and validating are performed by at least one
server under control of a provider of identification services, and

wherein said method further comprises:

transmitting a unique identifier associated with the single reference
recording from the at least one server to the client equipment after said validating is successful;
and

associating the unique identifier with the unknown recording in the client
equipment.

20. A method as recited in claim 19, further comprising:

comparing the unique identifier with a permission list of stored identifiers in the at
least one database;

indicating that the recording may be shared if there is a match for the unique
identifier in the permission list.

21. A method as recited in claim 19, further comprising:

comparing the unique identifier with a block list of stored identifiers in the at least
one database;

indicating that the recording may not be shared if there is a match for the unique
identifier in the block list.

22. A method of obtaining reference information stored in a database used to identify
unknown recordings, comprising:

obtaining non-waveform data associated with a recording possessed by a user of
the database for identification of recordings possessed by the user,

extracting at least one fingerprint from at least one portion of the recording; and

storing the at least one fingerprint as identifying information for the recording,
when a match is found in the database for the non-waveform data.

23. A method as recited in claim 22, wherein the non-waveform data indicates the
length of the recording.

24. A method as recited in claim 23, wherein the recording is permanently stored on a
removable medium and the non-waveform data is derived from table of contents data for the
recording.

25. A method as recited in claim 22, wherein the non-waveform data includes text
associated with the recording.

26. A method as recited in claim 25, wherein the non-waveform data includes an ID3
tag.

27. A method as recited in claim 26,

wherein the ID3 tag includes encoded information,

wherein the database is maintained by a provider of identification services and
the encoded information is generated under control of the provider of identification services, and

wherein said method further comprises validating the non-waveform data by
decoding the encoded information prior to said storing of the identifying information in the
database.

28. A method as recited in claim 25, wherein the non-waveform data includes a
watermark.

29. A method as recited in claim 25, wherein the non-waveform data includes media 
information regarding source media type. 

30. A method as recited in claim 29, wherein the media information identifies the source
media type as CD-R.

31. A method as recited in claim 29, wherein the media information identifies the source
media type as CD-DA.

32. A method as recited in claim 29, wherein the media information identifies the source
media type as a digital file.

33. A method as recited in claim 29, wherein the media information identifies the source
media type as a digital versatile disc.

34. A method as recited in claim 25, wherein the text includes a filename of the
recording.

35. A method as recited in claim 25, wherein the text includes a title of the recording.

36. A method as recited in claim 25, wherein the text includes an artist name of a
participant in creation of the recording.

37. A method as recited in claim 25, wherein the text includes an album name
associated with the recording.

38. A method as recited in claim 22, wherein said obtaining and extracting are
performed by client equipment possessed by a plurality users for different copies of the
recording and different users extract different fingerprints from the recording.

39. A method as recited in claim 38, further comprising:

maintaining the database on at least one server under control of a provider of
identification services, and

transmitting from the at least one server to the client equipment, extraction
instructions on which of the different fingerprints each of the client equipment extracts.

40. A method as recited in claim 39, further comprising:

transmitting the non-waveform data from the client equipment to the at least one
server, and

selecting the extraction instructions by the at least one server for said
transmitting to the client equipment based on the non-waveform data.

41. A method as recited in claim 40, further comprising updating the extraction
instructions based at least in part on frequency of receipt of the non-waveform data for the
recording.

42. A method as recited in claim 40, wherein said selecting of the extraction instructions
is based at least in part on type of the client equipment receiving the extraction instructions.

43. A method as recited in claim 40, wherein said selecting of the extraction instructions
is based at least in part on geographical location of the client equipment receiving the extraction
instructions.

44. A method as recited in claim 40, wherein said selecting of the extraction instructions
is based at least in part on software operating on the client equipment receiving the extraction
instructions.

45. A method as recited in claim 40, further comprising updating the extraction
instructions based at least in part on number of users who have supplied the identifying
information.

46. A method as recited in claim 40, wherein said selecting of the extraction instructions
is based at least in part on quality of the copies of the recording.

47. A method as recited in claim 40, further comprising transmitting the at least one
fingerprint from the client equipment to the at least one server at a time specified by the
extraction instructions.

48. A method as recited in claim 47, further comprising storing the at least one
fingerprint at the client equipment until a specified number of fingerprints are ready for said
transmitting.

49. A method as recited in claim 47, wherein said transmitting of the at least one
fingerprint occurs when a communication channel with the at least one server is available.

50. A method as recited in claim 47, wherein said transmitting of the at least one
fingerprint for a first recording accessed by a piece of client equipment occurs with said
transmitting of the non-waveform data for a second recording accessed by the piece of client
equipment.

51. A method as recited in claim 47,

wherein the recording is permanently stored on a removable medium and the
client equipment generates at least one encoded file from the recording, and

wherein said transmitting transmits the at least one fingerprint before encoding
the recording.

52. A method as recited in claim 47,

wherein the recording is permanently stored on a removable medium and the
client equipment generates at least one encoded file from the recording, and

wherein said transmitting transmits the at least one fingerprint after encoding one
track of the removable medium.

53. A method as recited in claim 47,

wherein the recording is permanently stored on a removable medium and the
client equipment generates at least one encoded file from the recording, and

wherein said transmitting transmits the at least one fingerprint after receiving an
indication that encoding of the removable medium has been completed.

54. A method as recited in claim 22, wherein the database includes the identifying
information for musical recordings.

55. A method as recited in claim 22, wherein the database includes the identifying
information for video recordings.

56. A method as recited in claim 22, further comprising:

detecting a quality of the at least one fingerprint;

identifying another copy of the recording using the at least one fingerprint; and

replacing the at least one fingerprint with a higher quality fingerprint when the
other copy of the recording produces the higher quality fingerprint.

57. A method as recited in claim 56, wherein said detecting of the quality is based on an
encoding technique used for the recording.

58. A method as recited in claim 56, wherein said detecting of the quality is based on a
media type used to store the recording.

59. A method as recited in claim 56, wherein said detecting of the quality is based on
error correction capability of user equipment accessing the recording.

60. A method as recited in claim 59, wherein said detecting of the quality assigns higher
quality when hardware error correction is used than when software error correction is used by the
user equipment.

61. A method as recited in claim 56, wherein said detecting of the quality is based on
number of errors detected during said extracting of the fingerprint.

62. A method as recited in claim 22,

wherein said obtaining and extracting are performed by client equipment
possessed by a plurality users for different copies of the recording, and

wherein said method further comprises:

comparing the at least one fingerprint obtained from one of the users with
the at least one fingerprint extracted from at least one other user; and

updating the at least one fingerprint in the database based on said
comparing.

63. A method as recited in claim 62, wherein said updating is performed after said
comparing determines that fingerprints from different users have a predetermined correlation.

64. A method as recited in claim 62, wherein said updating combines fingerprints from 
different users for storage in the database. 

65. A system for identifying recordings, comprising:

an extraction unit to extract information about an unknown recording stored in
media possessed by a user and at least one algorithmically determined fingerprint from at least
one portion of the unknown recording; and

an identification unit, coupled to said extraction unit, to make a possible
identification of the unknown recording using at least one piece of the information extracted from
the unknown recording and an identification database of corresponding information for
reference recordings, and to identify the unknown recording when the possible identification
based on each of the at least one piece of the information in combination with the at least one
algorithmically determined fingerprint identifies a single reference recording with respective
confidence levels.

66. A system for identifying recordings, comprising:

an extraction unit to extract fingerprints from at least one portion of an unknown
recording using a plurality of algorithms; and

an identification unit, coupled to said extraction unit, to make a possible
identification of the unknown recording using at least two of the fingerprints extracted from the
unknown recording and at least one database of correspondingly generated fingerprints for
reference recordings, and to identify the unknown recording when the possible identification
based on each of the fingerprints identifies a single reference recording with respective
confidence levels.

67. A system for obtaining reference information stored in a database used to identify
unknown recordings, comprising:

a receiving unit to obtain non-waveform data associated with a recording
possessed by a user of the database for identification of recordings possessed by the user,

an extraction unit to extract at least one fingerprint from at least one portion of
the recording; and

a storage unit, coupled to said receiving unit and said extraction unit, to store the
at least one fingerprint as identifying information for the recording, when a match is found in the
database for the non-waveform data.